Tumor gene expression data classification via sample expansion-based deep learning
Abstract
Because tumors seriously harm human health, effective diagnostic measures are urgently needed for tumor therapy. Early tumor detection is particularly important for treating patients successfully, and a key issue is how to reliably discriminate tumor samples from normal ones. Many classification methods, such as Support Vector Machines (SVMs), have been proposed for tumor classification. Recently, deep learning has performed well in classification tasks across many fields. However, it has rarely been applied to tumor classification because gene expression datasets contain too few training samples. In this paper, a Sample Expansion method is proposed to address this problem. Inspired by the Denoising Autoencoder (DAE), a large number of samples are generated by randomly corrupting part of the input many times. The expanded samples not only retain the benefits of corrupted data in DAE but also partly alleviate the shortage of training samples in gene expression data. Because Stacked Autoencoder (SAE) and Convolutional Neural Network (CNN) models perform well in classification tasks, the applicability of SAE and a one-dimensional CNN (1DCNN) to gene expression data is analyzed. Finally, two deep learning models, Sample Expansion-Based SAE (SESAE) and Sample Expansion-Based 1DCNN (SE1DCNN), are designed to classify tumor gene expression data using the expanded samples. Experimental studies indicate that SESAE and SE1DCNN are effective for tumor classification.
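The sample expansion idea can be illustrated with a minimal sketch. It assumes that the corruption step is DAE-style masking noise, where each expression value is independently set to zero with a fixed probability, and that every corrupted copy inherits its source sample's label. The function name `expand_samples` and its parameters (`copies`, `corruption_rate`, `seed`) are hypothetical and not taken from the paper; this is an illustration of the technique, not the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def expand_samples(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    copies: int = 10,
    corruption_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical sample-expansion sketch using DAE-style masking noise.

    Each original sample yields `copies` corrupted versions. In every
    version, each gene expression value is independently replaced by 0.0
    with probability `corruption_rate`. Each version keeps the label of
    the sample it was derived from.
    """
    if not 0.0 <= corruption_rate <= 1.0:
        raise ValueError("corruption_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the expansion is reproducible
    expanded_x: List[List[float]] = []
    expanded_y: List[int] = []
    for x, y in zip(samples, labels):
        for _ in range(copies):
            # Masking noise: zero out a random subset of the genes.
            corrupted = [0.0 if rng.random() < corruption_rate else v for v in x]
            expanded_x.append(corrupted)
            expanded_y.append(y)
    return expanded_x, expanded_y


if __name__ == "__main__":
    # Toy data: two samples with four genes each (1 = tumor, 0 = normal).
    X = [[2.1, 0.5, 3.3, 1.2], [0.9, 4.0, 0.1, 2.7]]
    y = [1, 0]
    X_big, y_big = expand_samples(X, y, copies=5, corruption_rate=0.25)
    print(len(X_big))  # 10
```

Two samples with `copies=5` give ten training vectors; in practice the number of copies and the corruption rate would need tuning, and whether the uncorrupted originals are also kept in the training set is a design choice this sketch leaves to the caller.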